An example of how optimizing for short-term rewards makes you weaker
from 2023-02-14 Organize the top page leads
@tsukammo: Awkwardly enough, the reason life optimization doesn't work can be explained with game tree search.
https://gyazo.com/597878edc889a3c2489d01be73177041
@tsukammo: This is what happens when the evaluation function is based on direct rewards alone, so the common "life hacks" amount to tuning the evaluation function: add "curiosity", or "split the goal into small steps and set up a reward for each".
Yeah, I know all that. I just don't.
Exploration-exploitation trade-off
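A minimal sketch of the point in the tweets above, assuming a toy world that is not in the original: each move is either "snack" (a small reward right now) or "study" (nothing now, a big payoff after several steps). The names `GOAL`, `BIG_REWARD`, `GAMMA`, the +2 bonus, and both evaluation functions are hypothetical choices for illustration. A shallow search that scores moves by direct reward alone never sees the payoff, while the same search with small per-step rewards added to the evaluation function reaches it.

```python
from typing import Callable

# All constants below are illustrative assumptions, not values from the tweets.
ACTIONS = ("study", "snack")
GOAL = 5           # "study" moves needed before the big payoff
BIG_REWARD = 100   # payoff received on the move that reaches GOAL
HORIZON = 20       # number of moves in one run
GAMMA = 0.9        # discount, so a reward now beats the same reward later

Evaluate = Callable[[int, str], float]

def step(progress: int, action: str) -> tuple[int, int]:
    """Return (new_progress, direct_reward) for one move."""
    if action == "snack":
        return progress, 1
    new_progress = progress + 1
    return new_progress, BIG_REWARD if new_progress == GOAL else 0

def direct_eval(progress: int, action: str) -> float:
    """Evaluation based on the direct reward alone."""
    return step(progress, action)[1]

def shaped_eval(progress: int, action: str) -> float:
    """Direct reward plus a small bonus for each step toward the goal."""
    new_progress, reward = step(progress, action)
    bonus = 2 if action == "study" and new_progress <= GOAL else 0
    return reward + bonus

def lookahead(progress: int, depth: int, evaluate: Evaluate) -> float:
    """Best discounted sum of evaluate() over the next `depth` moves."""
    if depth == 0:
        return 0.0
    return max(
        evaluate(progress, a)
        + GAMMA * lookahead(step(progress, a)[0], depth - 1, evaluate)
        for a in ACTIONS
    )

def play(evaluate: Evaluate, depth: int = 3) -> int:
    """Play one run with depth-limited search; return the TRUE total reward."""
    progress, total = 0, 0
    for t in range(HORIZON):
        d = min(depth, HORIZON - t)  # never look past the end of the run
        action = max(
            ACTIONS,
            key=lambda a: evaluate(progress, a)
            + GAMMA * lookahead(step(progress, a)[0], d - 1, evaluate),
        )
        progress, reward = step(progress, action)
        total += reward  # the bonus is only for choosing, never counted here
    return total

if __name__ == "__main__":
    print(play(direct_eval))   # 20: snacks forever, never reaches the goal
    print(play(shaped_eval))   # 115: studies to the goal, then snacks
```

In this sketch the search depth (3) is shorter than the distance to the payoff (5), which is why the direct-reward evaluation fails. Raising `depth` to 5 also fixes it, but the cost of the search grows exponentially with depth, so changing the evaluation function is the cheaper lever.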
---
This page is auto-translated from /nishio/短期的報酬に最適化すると弱くなる例 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.